6 research outputs found

    An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis

    Root Cause Analysis (RCA) is often used in manufacturing analysis to prevent the recurrence of undesired events. Association rule mining (ARM) was introduced in RCA to extract frequently occurring patterns, interesting correlations, associations or causal structures among items in the database. However, frequent pattern mining (FPM) using Apriori-like algorithms and the support-confidence framework inherently suffers from the rare item problem. This has greatly reduced the performance of RCA, especially in the manufacturing domain, where imbalanced data is the norm in a production plant. In addition, exponential growth of data causes high computational costs in Apriori-like algorithms. Hence, this research proposes a two-stage FPM that integrates Principal Component Analysis (PCA) and the Weighted Apriori-T algorithm (PCA-WAT) to address these problems. PCA is used to generate item weights by considering maximally distributed covariance to normalise the effect of rare items. Using PCA, a significant rare item will have a higher weight, while a less significant high-occurrence item will have a lower weight. Apriori-T, with its indexing enumeration tree, is used for low-cost FPM. A semiconductor manufacturing case study with Work In Progress data and true alarm data is used to validate the proposed algorithm. The proposed PCA-WAT algorithm is benchmarked against the Apriori and Apriori-T algorithms. A comparison analysis on weighted support has been performed to evaluate the capability of PCA in normalising an item's support value. The experimental results show that PCA is able to normalise the item support value and reduce the influence of imbalanced data in FPM. Both quality and performance measures are used for evaluation. 
The quality measures compare the frequent itemsets and interesting rules generated across different support and confidence thresholds, ranging from 5% to 20% and from 10% to 90% respectively. The rule validation involves a business analyst from the related field. The domain expert has verified that the generated rules are able to explain the contributing factors in failure analysis. However, significant rare rules are not easily discovered because the normalised weighted support values are generally lower than the original support values. The performance measures compare the execution time in seconds (s) and the Random Access Memory (RAM) used during execution in megabytes (MB). The experimental results show that, compared to Apriori, the implementation of Apriori-T reduced computation time by at least 90% and RAM usage by 35.33%. The primary contribution of this study is a two-stage FPM for performing RCA in the manufacturing domain in the presence of imbalanced datasets. In conclusion, the proposed algorithm is able to overcome the rare item issue by implementing covariance-based support value normalisation, and the high computational cost issue by implementing an indexing enumeration tree structure. Future work should focus on rule interpretation, to generate rules that are more understandable to novices in data mining. In addition, suitable support and confidence thresholds are needed after the normalisation process to better discover the significant rare itemsets.
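
The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's PCA-WAT implementation: the item weights are hypothetical values supplied by hand (the abstract derives them from PCA), and scaling support by the mean item weight is an assumed formulation chosen only for illustration. The names `support`, `weighted_support`, `transactions` and `weights` are likewise invented for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Support scaled by the mean weight of the items (assumed formulation)."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


# Hypothetical toy data: three normal lots and one alarm lot (imbalanced).
transactions = [
    frozenset({"tool_A", "ok"}),
    frozenset({"tool_A", "ok"}),
    frozenset({"tool_A", "ok"}),
    frozenset({"tool_B", "alarm"}),
]

# Hypothetical weights: rare but significant items get a higher weight.
weights = {"tool_A": 0.5, "ok": 0.5, "tool_B": 2.0, "alarm": 2.0}

common_itemset = {"tool_A", "ok"}
rare_itemset = {"tool_B", "alarm"}

plain_common = support(common_itemset, transactions)
plain_rare = support(rare_itemset, transactions)
weighted_common = weighted_support(common_itemset, transactions, weights)
weighted_rare = weighted_support(rare_itemset, transactions, weights)

print(plain_common, plain_rare)        # unweighted: the rare itemset ranks lower
print(weighted_common, weighted_rare)  # weighted: the rare itemset ranks higher
```

With a plain support threshold the rare alarm itemset would be pruned before the common one; after weighting, the ordering is reversed, which is the normalising effect the abstract attributes to the PCA-derived weights.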

    Viola sp. (BR0000010811061)

    No full text
    Belgium Herbarium image of Meise Botanic Garden

    Expression analysis of transcript and protein markers that are related to agar yield and gel strength in Gracilaria changii (Rhodophyta)

    No full text
    The supply of agar as an important gelling and thickening agent in various industrial applications depends heavily on the harvesting of natural seaweed resources and on seaweed farming. To facilitate the selection of good seaweed sources with higher agar yield and stronger gel strength, an accurate and rapid screening method using molecular markers is necessary to replace the tedious, laborious and time-consuming conventional method, which involves agar extraction and gel analysis. In this study, we characterized the expression of a number of algal transcripts and proteins from an agar-producing seaweed, Gracilaria changii, with the aim of identifying potential markers for agar yield and gel strength. In total, 15 candidate transcripts that are directly or indirectly related to the putative agar biosynthetic pathway were identified based on a literature search. The transcript abundance of 4 and 11 of these candidates was found to be significantly (P < 0.05) correlated with the agar yield and gel strength of six G. changii samples, respectively. Among these marker genes, the transcript levels of GcFBPA and GcGALE have the highest linear correlation with both agar yield and gel strength. The protein abundance of GcFBPA and GcGALE was further examined in 13 G. changii samples and was found to have a highly significant (P < 0.01) correlation with agar gel strength and agar yield, respectively. GcFBPA and GcGALE may have good potential for use in molecular screening of yield traits and gel quality of G. changii at both the RNA and protein levels.